feat(nightwatch): add dual-panel evaluation for drift detection #721

Merged
AlexMikhalev merged 1 commit into main from task/91-dual-panel-nightwatch
Mar 26, 2026

@AlexMikhalev
Contributor

Summary

  • Adds DualPanelResult struct for dual-panel quality assessment
  • dual_panel_evaluate() runs two independent quality checks (certificate + structure)
  • Drift detected when panels disagree significantly (agreement < 0.5)
  • Builds on ReasoningCertificate validation from PR #716 (feat(orchestrator): add ReasoningCertificate type for audit events)
  • Unit tests for agreement, drift, and missing certificate scenarios
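The drift rule above can be sketched as follows. This is a minimal illustration, not the merged code: the field names, the `from_scores` constructor, and the agreement formula (one minus the absolute score difference, with scores in [0, 1]) are assumptions. Only the `DualPanelResult` name and the 0.5 threshold come from this PR.

```rust
/// Hypothetical shape of the result type; the field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct DualPanelResult {
    pub panel_a_score: f64,
    pub panel_b_score: f64,
    pub agreement: f64,
    pub drift_detected: bool,
}

/// Drift threshold stated in the PR description.
pub const DRIFT_THRESHOLD: f64 = 0.5;

impl DualPanelResult {
    /// Builds a result from two panel scores.
    /// Assumption: agreement is 1 - |a - b| on scores clamped to [0, 1].
    pub fn from_scores(panel_a_score: f64, panel_b_score: f64) -> Self {
        let a = panel_a_score.clamp(0.0, 1.0);
        let b = panel_b_score.clamp(0.0, 1.0);
        let agreement = 1.0 - (a - b).abs();
        Self {
            panel_a_score: a,
            panel_b_score: b,
            agreement,
            // Drift is flagged only when agreement is strictly below the threshold.
            drift_detected: agreement < DRIFT_THRESHOLD,
        }
    }
}
```

Under these assumptions, scores of 0.9 and 0.8 give an agreement of about 0.9 and no drift, while 0.9 and 0.1 give about 0.2 and flag drift.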

Test plan

  • cargo clippy --workspace --all-targets -- -D warnings passes
  • cargo fmt --all --check passes
  • cargo test --workspace passes

Refs #91

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add DualPanelResult struct and dual_panel_evaluate function to enable
two independent quality assessments on agent output. Drift is detected
when panel agreement falls below 0.5.

Panel A: Scores based on ReasoningCertificate quality (premises,
claims, edge cases, confidence)
Panel B: Scores based on output structure (sections, evidence markers,
conclusion markers, minimum length)
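One way the two panels could score their inputs is sketched below. Everything here is illustrative: the `ReasoningCertificate` fields, the function names `score_certificate` and `score_structure`, the equal weighting of the four criteria, the marker keywords, the 200-byte minimum length, and the 0.0 score for a missing certificate are all assumptions rather than the merged implementation.

```rust
/// Hypothetical stand-in for the ReasoningCertificate type from PR #716.
pub struct ReasoningCertificate {
    pub premises: Vec<String>,
    pub claims: Vec<String>,
    pub edge_cases: Vec<String>,
    pub confidence: f64,
}

/// Panel A: one quarter point per certificate criterion that is satisfied.
/// Assumption: a missing certificate scores 0.0.
pub fn score_certificate(cert: Option<&ReasoningCertificate>) -> f64 {
    let Some(cert) = cert else { return 0.0 };
    let checks = [
        !cert.premises.is_empty(),
        !cert.claims.is_empty(),
        !cert.edge_cases.is_empty(),
        cert.confidence >= 0.5,
    ];
    checks.iter().filter(|&&ok| ok).count() as f64 / checks.len() as f64
}

/// Panel B: one quarter point per structural criterion that is satisfied.
pub fn score_structure(output: &str) -> f64 {
    let lower = output.to_lowercase();
    let checks = [
        // Sections: at least one Markdown-style heading line.
        output.lines().any(|line| line.trim_start().starts_with('#')),
        // Evidence markers (assumed keywords).
        lower.contains("evidence") || lower.contains("because"),
        // Conclusion markers (assumed keywords).
        lower.contains("conclusion") || lower.contains("therefore"),
        // Minimum length in bytes (assumed value).
        output.len() >= 200,
    ];
    checks.iter().filter(|&&ok| ok).count() as f64 / checks.len() as f64
}
```

Because the panels read different inputs (the certificate versus the raw output text), a well-formed certificate attached to a poorly structured output, or the reverse, produces diverging scores, which is the disagreement the drift check looks for.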

Includes comprehensive unit tests covering:
- Both panels agree (no drift)
- Panels disagree (drift detected)
- Missing certificate scenario
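The three scenarios could be exercised with tests along these lines. This is a sketch against a hypothetical helper, `agreement_and_drift`, not the tests in this commit. It assumes agreement is one minus the absolute score difference and that a missing certificate scores 0.0 for Panel A, so a well-structured output without a certificate is reported as drift; the real behaviour for that case may differ.

```rust
/// Hypothetical helper: returns (agreement, drift_detected) for two panel scores.
pub fn agreement_and_drift(certificate_score: Option<f64>, structure_score: f64) -> (f64, bool) {
    // Assumption: a missing certificate counts as a Panel A score of 0.0.
    let a = certificate_score.unwrap_or(0.0);
    let agreement = 1.0 - (a - structure_score).abs();
    (agreement, agreement < 0.5)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn panels_agree_no_drift() {
        let (agreement, drift) = agreement_and_drift(Some(0.8), 0.75);
        assert!(agreement >= 0.5);
        assert!(!drift);
    }

    #[test]
    fn panels_disagree_drift_detected() {
        let (agreement, drift) = agreement_and_drift(Some(0.9), 0.1);
        assert!(agreement < 0.5);
        assert!(drift);
    }

    #[test]
    fn missing_certificate_counts_as_zero() {
        // Under the stated assumption, strong structure without a certificate is drift.
        let (_, drift) = agreement_and_drift(None, 0.9);
        assert!(drift);
    }
}
```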

Refs #91
@AlexMikhalev AlexMikhalev force-pushed the task/91-dual-panel-nightwatch branch from ed6693a to 9cc2a3e Compare March 26, 2026 15:36
@AlexMikhalev AlexMikhalev merged commit b23c17c into main Mar 26, 2026
11 of 12 checks passed
@AlexMikhalev AlexMikhalev deleted the task/91-dual-panel-nightwatch branch March 26, 2026 15:45